Abstract. This paper presents an algorithm for generating
scale-free networks with an adjustable clustering coefficient. The
algorithm is based on a random walk procedure combined with
a triangle generation scheme which takes into account genetic
factors; in this way, preferential attachment and clustering control
are implemented using only local information. Simulations are
presented which support the validity of the scheme and
characterize its tuning capabilities.

I. Introduction 

Network modelling has become a very active research field
since the discovery that many different complex systems share
some essential common features which can be captured in
a network model [7]. Although network nodes and links can
represent very different entities depending on the phenomenon
being analyzed, some common characteristics seem to be
ubiquitous in many models. For instance, common patterns
usually appear in social networks ([1], [24], [20]), biological
networks ([35], [14]), technological networks ([11], [?]) or
information networks ([2], [25]).¹ Many of these features are
non-trivial, so the traditional Erdős-Rényi (ER) model [9]
for random graphs is not sufficient to explain the behavior of
these systems.

The first feature appearing in real networks which was not
captured by the ER model was the small world effect, defined
by two factors: a slow increase of the network diameter with
network growth and the existence of an unexpectedly high number
of triangles in the network (clustering). In order to mimic
these properties, Watts and Strogatz proposed a new model in
[34]. Nevertheless, this model could not represent an additional
property also found in many real networks: the distribution of
the number of neighbors (degree distribution) follows a power
law, which is very different from the distributions predicted by
early models (e.g., the exponential distribution in the ER model).

¹ This classification of networks according to their nature was
proposed by Newman in [22], where a larger number of references
for each type of network can be found.

In [2] the Barabási-Albert (BA) network generation model
was presented, in which network growth and preferential
attachment were shown to be two essential features for
obtaining scale-free networks, i.e., networks whose degree
distribution follows a power law. Preferential attachment
means that new vertices added to the network attach
preferentially to high-degree vertices. When the preference
is linear, the probability of connecting to a given vertex is
proportional to its degree. The BA model implements
this preferential attachment using global network information
to compute such probability:

$$P_i = \frac{k_i}{\sum_{j=1}^{n} k_j}, \qquad n = \text{total number of nodes}. \tag{1}$$
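
To make the role of global information in equation (1) concrete, the following minimal sketch computes the attachment probabilities from the degrees of all vertices and samples one target accordingly. The adjacency-dictionary representation and the function names are assumptions made for illustration; they do not come from the BA paper.

```python
import random

def preferential_probabilities(adj):
    """Return {vertex: k_i / sum_j k_j} for an adjacency dict {vertex: set_of_neighbors}."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / total_degree for v, nbrs in adj.items()}

def pick_preferentially(adj, rng=random):
    """Sample one vertex with probability proportional to its degree.

    Note that the degree of every vertex is needed: this is the global
    information that local schemes try to avoid.
    """
    vertices = list(adj)
    weights = [len(adj[v]) for v in vertices]
    return rng.choices(vertices, weights=weights, k=1)[0]

# Hypothetical example: a star with hub 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
probs = preferential_probabilities(star)
```

In this star graph the hub holds half of the total degree, so it is selected with probability 1/2 and each leaf with probability 1/6.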

The existence of a scale-free structure in many real networks
has motivated the appearance of a number of new network
models trying to reproduce at least one of the three main
characteristics of real networks already mentioned (clustering,
long-tailed degree distribution, short diameter).² Different
approaches have been used: some models are based on a static
network size [23], [6], while others work on growing networks
[?], [?], [?].

In general, it is expected that the process of adding a new
vertex in real world networks would not require the availability
of such global information. Along this line, several authors
have studied alternative local schemes (employing rules that
only involve a vertex and its neighbors) to generate scale-free
networks without the use of global parameters [19], [29], [33],
[18], [11], [8], [15].

Among them, the use of random walkers to select node
attachment in a network-growth algorithm has been suggested
in [1] and successfully employed in [32], [27], [10]. In general,
the use of the proposed schemes has been justified on the
assumption that a random walk of arbitrary length l will
end up on a vertex i of degree k_i with the probability given in
equation (1), i.e., random walks are assumed to generate a pure
preferential selection procedure (to be used as the basis of a
preferential attachment scheme). The analytical characterization
of these random walk models has been performed under
some mean-field hypotheses, so that preferential attachment
is studied but no other network features are considered. In
[33] the correlation between clustering and degree is analysed,
also under the mean-field hypotheses, but the tuning of the
clustering coefficient is not addressed.

² Alternatively, some models try to mimic other network properties.
For example, the goal of the model presented in [16] is to ensure
that the network shows an arbitrary subgraph distribution. In [31],
the property to be reproduced is the existence of communities like
the ones observed in real social networks; this could be considered
a generalization of clustering control.

In this paper, alternative random walk selection schemes
are presented which allow for the control of both preferential
attachment and clustering coefficient in the process of growing
a scale-free network. The schemes depend on the transition
probability distribution of the random walk, in such a way
that each path sample may have a different length due to a
genetic factor. It is shown that an appropriate selection of
this transition probability distribution allows for the
tuning of the clustering coefficient of the generated network.

The paper is organized as follows. Section II starts by presenting
the motivation and goals of the proposed model for network
generation; the model is then described in detail. In Section III,
simulation results supporting the validity of the model are
presented. Finally, concluding remarks and future work lines are
summarized in Section IV.

II. Model

A. Motivation and goals

Many real world networks are very complex systems governed
by several fundamental characteristics. So far, existing
network models capture only some of these features, which
may or may not be sufficient for the aim of the analysis. Hence,
the construction of new, more elaborate models addressing
the emergence and behavior of additional characteristics is
a relevant challenge. As mentioned in the Introduction, one
big step towards explaining complex networks was the
BA model, where the emergence of a power law in the
degree distribution was explained via two simple assumptions:
growth and preferential attachment (PA). The BA model
has been a fundamental reference, although it presents some
limitations.

On the one hand, several authors [10], [27] have pointed
out the difficulties of a practical implementation of preferential
attachment policies: as defined in the BA model, when a new
node is about to join the network, it needs to know the
degree of every node in the whole network in order to calculate
the probability of linking to each existing node. This scheme
does not seem to explain the behavior of real-life
scenarios, such as a blogger linking to a web page or a person
making new friends (obviously, they do not have or do not
employ global network structure information). A new model
based only on local schemes was presented in [27], suggesting
that PA can be obtained from a random walk (at least
approximately). In [10] this random walk based model
is generalized so that the degree distribution can have an
exponent different from γ = 3 if a certain fraction p_v of edges
is created purely at random (i.e., the new node is linked to a
randomly chosen existing node, without performing a walk).

On the other hand, there is also one important feature
which cannot be taken into account when employing the
original BA model. Although this model performs better than
the Erdős-Rényi (ER) model concerning the degree distribution,
it cannot produce the high clustering coefficient which has
been observed in many real networks³ (see Table I, presenting
results from [21] for online social networks). The clustering
coefficient of real networks is known to be higher than the one
provided by a purely random model, and its value depends on
the nature of the network. It usually takes high values for social
networks (for example, C = 0.79 for the IMDb actor network [34]),
but there are some networks which exhibit a power law with
a much lower clustering level (C = 0.011 for a P2P network
[26]). In [13] a mechanism for triangle formation was proposed
which allowed the control of the clustering coefficient; this
mechanism is further developed and employed in this paper.

Table I
Clustering coefficient in real networks, and in models of the
networks of the same size

Network       C       C/C_ER   C/C_BA
Flickr        0.313   412      25.2
LiveJournal   0.330   119.0    17.8
Orkut         0.171   7.24     5.27
Youtube       0.136   36.9     69.4

³ Although there are no analytical results for the clustering
coefficient in the BA model, it is known ([1], [22]) that it decays
as a negative power of the network size N, while in real networks
C is independent of N.

In [12] some characteristics of social networks which
are related to genetic factors are presented. Specifically, it is
shown that the clustering coefficient is one of those heritable
network metrics. In fact, there are people who are very likely
to introduce their friends to each other, whereas other
people prefer to keep their friends apart from each other.

In this paper, a new network growing scheme is presented
where a triangle formation scheme is further developed to
include a genetic factor as the basis for a clustering control
mechanism. This genetic factor in the nodes (known to occur
in real networks) combines well with a random walk based
node selection procedure, so that only local
information is employed in the whole edge addition process.

B. Model description 

As mentioned before, the main aim of the model proposed in
this paper is to generate scale-free networks whose clustering
coefficient can be controlled by using only local information.
The scale-free network is generated via a growing scheme
which employs random walks as a local approximation to the
preferential attachment criterion. The model proposed here is
a modification of the model presented by Evans
and Saramäki (ES model) [27]. The ES model is defined as
follows:

• Initial condition: start with a network of n_0 vertices.

• Growth: at each time step a vertex and m edges are added
to the network. Note that m ≤ n_0.

• Linking by a random walk (RW): the new vertex v_new
is joined to m existing vertices, which are selected in the
following way: a random existing vertex v_s is chosen,
then an l-step random walk starting from v_s is performed;
the arrival vertex v_e obtained at the end of the walk is
linked to v_new.
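
The random walk linking rule above can be sketched as follows. This is an illustrative reading and not the authors' implementation: the adjacency-dictionary representation, the function names, and the choice to redraw a target that was already selected are assumptions made here. The sketch also assumes a connected seed in which every vertex has at least one neighbor.

```python
import random

def random_walk(adj, start, length, rng=random):
    """Return the vertex reached after `length` uniform random steps from `start`."""
    current = start
    for _ in range(length):
        # Each step moves to a uniformly chosen neighbor of the current vertex.
        current = rng.choice(sorted(adj[current]))
    return current

def es_select_targets(adj, m, length, rng=random):
    """Select m distinct attachment targets, each by a fresh l-step walk.

    Assumption: a walk that ends on an already selected vertex is redrawn.
    """
    vertices = sorted(adj)
    targets = set()
    while len(targets) < m:
        v_s = rng.choice(vertices)          # random starting vertex
        targets.add(random_walk(adj, v_s, length, rng))
    return targets

# Hypothetical seed: a ring of 6 vertices.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
targets = es_select_targets(ring, m=2, length=3, rng=random.Random(1))
```

Only the neighborhood of the current vertex is consulted at each step, which is what makes the selection local.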

Our model makes use of some random walk properties which
turn out to be very useful for controlling the appearance of
triangles in the network. First of all, suppose v_new has been
linked to a first selected network node v_s; obviously this link
does not generate any triangle. Now, an l = 1 walk
starting from v_s will provide a new node that, if also linked
to v_new, will generate a triangle. In this way, selecting nodes
via successive l = 1 walks would add m - 1 triangles to the
network. This fact suggests the possibility of implementing
a triangle generation control scheme based on selecting new
random starting points for the next walk (avoiding triangle
formation), proportionally combined with successive l = 1 walks
(forcing triangle formation). This approach was the first to be
analysed, but such a control mechanism does not behave accurately
when reproducing low levels of clustering coefficient. In fact,
for a given clustering control parameter, a significant variance
was observed, violating our design principle of fine clustering
coefficient control (numerical results of this behavior are
shown in Section III).

Figure 1. (Panels: l = 1 and l = 2.) The clustering coefficient is
controlled by the following mechanism: if an l = 1 walk is carried
out from the last joined vertex v_s, we ensure that a triangle is
formed. However, if l = 2, we only generate a triangle if v_s and
v_e are already connected.

Alternatively, random walks with l = 2 were employed.
Note that if l = 2 is chosen, almost no triangles will be added
to the network. In fact, if the seed network with n_0 vertices
does not have any triad, it is straightforward to show
that no triangle will be added (the probability of v_s and v_e
being already connected will be zero). The implications of
using l = 1 or l = 2 are illustrated in Figure 1.
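
The difference between l = 1 and l = 2 walks can be checked on a small triangle-free graph. The sketch below is illustrative only: the ring graph, the helper names and the adjacency-dictionary representation are assumptions. It tests whether linking a new vertex to both v_s and the arrival vertex v_e would close a triangle.

```python
import random

def walk(adj, start, length, rng):
    """Arrival vertex of a uniform random walk of the given length."""
    current = start
    for _ in range(length):
        current = rng.choice(sorted(adj[current]))
    return current

def closes_triangle(adj, v_s, v_e):
    """True if a new vertex linked to both v_s and v_e would form a triangle,
    i.e. if v_s and v_e are already adjacent."""
    return v_e in adj[v_s]

# Hypothetical triangle-free seed: a ring of 10 vertices.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
rng = random.Random(0)
one_step = [closes_triangle(ring, 0, walk(ring, 0, 1, rng)) for _ in range(200)]
two_step = [closes_triangle(ring, 0, walk(ring, 0, 2, rng)) for _ in range(200)]
```

Every 1-step walk ends on a neighbor of v_s and therefore closes a triangle, while on this ring no 2-step walk does. A 2-step walk can also return to v_s itself, a case that an implementation has to handle explicitly.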

Therefore, controlling the number of l = 1 and l = 2
random walks allows for an accurate tuning of the number
of triangles added to the network. This might be implemented
by just assigning probability p_1 to l = 1 walks and 1 - p_1 to
the l = 2 case. However, we propose to include this control in
a node attribute which is assigned at the node's "birth", inspired
by the already mentioned results in [12]. In our proposal, any
vertex v_i is assigned a probability p(v_i), drawn from a certain
distribution f(p), which reflects the genetic factor mentioned there.
This probability remains constant during the vertex's life and
determines the length of the random walks starting from that
node (i.e., a 1-step walk happens with probability p(v_s), v_s
being the last linked vertex; a 2-step walk is selected otherwise).

Following the principles explained above, our algorithm is defined
as follows:

1) Start with a network of n_0 vertices. Each vertex is
assigned an attribute p(v_i), i = 1 ... n_0, according to
the random distribution f(p).

2) A vertex v_s is chosen randomly.

3) A random walk of length l > 1 is performed from v_s, randomly
choosing at each step a neighbor of the current vertex.
The arrival vertex v_e is marked.

4) Start a new walk from the last marked vertex v_i. With
probability p(v_i) this will be a 1-step walk; otherwise it will
be a 2-step walk. Mark the arrival vertex.

5) Repeat step 4 until m vertices are marked (i.e., step 4 is
performed m - 1 times). Note that m ≤ n_0.

6) Add one vertex to the network. Add m links between
the new vertex and the m marked nodes. Assign to the
new vertex a probability p(v_i) according to f(p).⁴

7) Repeat steps 2 to 6 (n - n_0 times).
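
The seven steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: it uses a ring seed with n_0 = max(10, m), a first walk of length 7, and a two-valued f(p) in which a fraction cc of vertices get p = 1 (the values reported later in the paper), and it adds one rule that the text does not specify, namely that a walk ending on an already marked vertex is retried, with a uniformly random unmarked vertex as a last resort. All names (`grow_network`, `max_tries`, and so on) are hypothetical.

```python
import random

def walk(adj, start, length, rng):
    """Arrival vertex of a uniform random walk of the given length."""
    current = start
    for _ in range(length):
        current = rng.choice(sorted(adj[current]))
    return current

def grow_network(n, m, cc, first_walk_len=7, rng=None, max_tries=100):
    """Grow a network of n vertices; returns (adjacency dict, p attribute dict)."""
    rng = rng or random.Random()
    n0 = max(10, m)
    # Step 1: ring seed; p(v) = 1 with probability cc, else 0 (assumed f(p)).
    adj = {i: {(i - 1) % n0, (i + 1) % n0} for i in range(n0)}
    p = {v: float(rng.random() < cc) for v in adj}
    for new in range(n0, n):
        v_s = rng.randrange(len(adj))                     # step 2
        marked = [walk(adj, v_s, first_walk_len, rng)]    # step 3
        while len(marked) < m:                            # steps 4 and 5
            last = marked[-1]
            for _ in range(max_tries):
                length = 1 if rng.random() < p[last] else 2
                v_e = walk(adj, last, length, rng)
                if v_e not in marked:                     # assumed rule: retry duplicates
                    break
            else:
                # Assumed fallback so the loop always terminates.
                v_e = rng.choice([v for v in adj if v not in marked])
            marked.append(v_e)
        # Step 6: links are added only after all m vertices are marked,
        # so the network stays unchanged while the walks are performed.
        adj[new] = set(marked)
        for v in marked:
            adj[v].add(new)
        p[new] = float(rng.random() < cc)
    return adj, p

adj, p = grow_network(n=200, m=2, cc=0.5, rng=random.Random(42))
```

Each added vertex contributes exactly m edges, so the example network has 10 + 2 · 190 = 390 edges in total.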

Hence, the algorithm has the following design parameters:
the number of nodes n, the number of edges to be attached per
vertex m, and a probability distribution f(p). Note that, as is usual
in growing network models, m controls the average
degree, since ⟨k⟩ = 2m. Concerning the distribution f(p),
in order to simplify the interpretation of results, a binomial
(two-valued) distribution has been chosen to illustrate the scheme
in this paper; so there is a fraction cc of nodes with p(v_i) = 1,
the rest of the nodes having p(v_i) = 0. It is expected that different
distributions may lead to different community structures in the
resulting networks.

Another interesting issue is the selection of l > 1 for the
first walk, since it is different from previous random walk
based models. Although it is stated in [27] (and supported by
numerical simulations) that a walk of length 1 should be
enough to produce a valid preferential attachment, we have
found problems with some networks, where there is still a
significant correlation between the neighbors' average degree
and how frequently a certain vertex is marked by a 1-step walk
(in pure preferential attachment, the vertex selection criterion
is not biased by the degrees of the vertex's neighbors).

III. Simulation Results 

A number of simulations have been carried out to check
the performance of the proposed model with respect to its two
main goals: generating a power law in the degree distribution and
controlling the clustering coefficient.⁵
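
As a reference for how a clustering coefficient can be measured in such simulations, the sketch below computes the usual local clustering coefficient, C_i = 2 T_i / (k_i (k_i - 1)) for k_i > 1 and 0 otherwise, where T_i is the number of links among the neighbors of vertex i, and averages it over all vertices. The function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
def local_clustering(adj, v):
    """Local clustering coefficient of vertex v in an adjacency dict of sets."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each link between two neighbors is seen from both endpoints, hence // 2.
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c_avg = average_clustering(triangle_with_tail)
```

In this example vertices 0 and 1 have C_i = 1, vertex 2 has C_i = 1/3 and vertex 3 has C_i = 0, so the average is 7/12.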

First of all, random walk models require the construction
of an initial connected seed network with n_0 vertices; it is
important to point out the influence of the topology of this
initial network on the final outcome. As also happens in
the BA model, a non-zero probability must be assigned to
the initial isolated vertices (in most implementations this is
done by setting a certain parameter a in the distribution so
that p(k) ∼ k + a). In order not to bias the first walk,
a regular lattice must be chosen where the degree k_0 is
equal for all n_0 nodes (this is equivalent to the mentioned a
parameter in the BA model).

⁴ The reason why any chosen vertex is first marked instead of being
directly linked to the new vertex (in step 3 or 4) is that it is
desirable for the network to remain unchanged during the addition of
all m edges. This procedure is common in many implementations of
scale-free network models such as BA, since the hypothesis of the
network remaining unchanged during the vertex addition process is
used in the mean-field equations which support the claim that
preferential attachment produces power laws in the degree
distribution [22]. In random walk based models, this unchanged
network hypothesis is even more important, since the addition of an
edge to the last visited vertex can severely change the trajectory
of the following random walk.

⁵ The clustering coefficient of vertex i is
$$C_i = \begin{cases} \dfrac{2 T_i}{k_i (k_i - 1)} & k_i > 1 \\ 0 & k_i \le 1 \end{cases}, \qquad C = \frac{1}{N} \sum_{i} C_i,$$
where T_i denotes the number of links among the neighbors of
vertex i. There is also a different clustering coefficient
definition for the global graph in [4], but it is not used in this
paper.

Figure 2. Degree distribution. An inappropriate choice of n_0 may
lead to a "winner-takes-all" effect. In this simulation, m = 1 and
n_0 = 2 produce a star-like graph where a power-law degree
distribution cannot emerge from a growing process based on
preferential attachment.

Figure 3. Degree distribution. Simulation with n = 10^?, m = 2,
using a fully connected graph of n_0 = 20 nodes as a seed. It clearly
produces a "step-like" deviation from a power-law tail in the
cumulative degree distribution.

Besides, even if we start with a network meeting those
requirements, the performance of the model might be affected
by the values selected for n_0 and k_0. Specifically, if n_0 is
too small, there is a high probability of a so-called
"winner-takes-all" phenomenon, as can be seen in Figure 2. On the
other hand, choosing a large value for k_0 might also create some
undesired effects. In this case, choosing a fully connected graph
(i.e., k_0 = n_0 - 1) might produce a deviation from the power-law
behavior in the degree distribution, as can be seen in Figure 3.
To overcome this situation, a ring graph with n_0 = max(10, m)
has been used, which seems to behave properly in
most cases; hence, all the simulation results presented
below make use of this seed. In addition, as pointed out in the
model description, a length l = 7 for the first walk per added
node has been used, in order to avoid the dependence on the
neighbors' degree during the selection process.
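
The two seed topologies discussed above can be built and compared with a few lines of code. The helper names and the adjacency-dictionary representation are assumptions made for illustration; only the rule n_0 = max(10, m) for the ring and the fully connected alternative come from the text.

```python
def ring_seed(m, minimum_size=10):
    """Ring with n_0 = max(minimum_size, m) vertices, so every vertex has k_0 = 2."""
    n0 = max(minimum_size, m)
    return {i: {(i - 1) % n0, (i + 1) % n0} for i in range(n0)}

def complete_seed(n0):
    """Fully connected seed, so every vertex has k_0 = n_0 - 1."""
    return {i: set(range(n0)) - {i} for i in range(n0)}

def is_regular(adj):
    """True if all vertices have the same degree (no bias for the first walk)."""
    return len({len(nbrs) for nbrs in adj.values()}) == 1

ring = ring_seed(m=2)
clique = complete_seed(20)
```

Both seeds are regular, so neither biases the first walk; they differ only in k_0, which is the quantity the text identifies as the source of the step-like deviation.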

A. Scale-free emergence 

Figure 4. A power law is obtained in the degree distribution, very
close to the BA pure preferential attachment model. These results
persist under any change in the clustering control parameter cc.

Once a proper selection procedure for the initial seed has
been settled, we now focus on the first design criterion: the
emergence of a scale-free network during the growing process.
In Figure 4 it can be seen that a power law is produced
(a simulation of the original BA model⁶ is also included to
allow comparison). In addition, this behavior
does not depend on the value assigned to the clustering control
parameter cc, as can be seen in the results
for the two extreme values of this parameter. This power-law
regime is independent of cc as well as of the network size n
and the average degree ⟨k⟩ = 2m; hence, the model
proposed here behaves like other preferential attachment based
models [1], [10].
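
A power-law tail is often inspected through the cumulative degree distribution, the quantity mentioned in the caption of Figure 3. The following sketch computes P(K ≥ k) from an adjacency dictionary; the function name and the representation are assumptions made for illustration.

```python
from collections import Counter

def cumulative_degree_distribution(adj):
    """Return a sorted list of (k, P(K >= k)) pairs for the degrees present in adj."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    result = []
    remaining = n  # number of vertices with degree >= current k
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result

# Hypothetical example: a star with hub 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
ccdf = cumulative_degree_distribution(star)
```

Plotting these pairs on doubly logarithmic axes gives an approximately straight line for a power-law tail; in the star example the result is [(1, 1.0), (3, 0.25)].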

B. Clustering coefficient control 

As mentioned earlier, the control of the clustering coefficient
is performed by changing the value of the probability cc that a
node is assigned a walk length of 1 in the binomial distribution
characterizing genetic factors. We start by presenting the
simulation results which correspond to the first (more intuitive)
approach mentioned in Section II; this approach suggested
the use of a new random starting vertex for avoiding triangle
formation. The results show that this approach leads to a high
level of variance for small values of cc, as can be seen
in Figure 5. On the other hand, the second approach, based
on a 2-step mechanism, performs much better,
as shown in Figure 6. Both figures show the clustering
coefficient dependence on cc for networks with n = 10^? nodes and
⟨k⟩ = 4. The mean clustering coefficient (blue points) and standard
deviation (red bars), obtained from twenty runs for each value
of cc, are also presented.

Two additional tests were performed for the proposed model
regarding its capability to control the clustering coefficient.
The first test shows that the clustering coefficient remains
constant for a given cc as the network keeps growing. This
result is supported by Figure 7, where a logarithmic axis is used
and the results from N = 1600 to N = 50 · 10^? are presented.

⁶ For the BA model, a = 2 was selected so that it agrees with our
model, where a ring (i.e., k_0 = 2) is used as a seed.
Figure 5. Clustering coefficient control by the new random starting
point strategy. The model shows a significant level of variance (red
bars represent the standard deviation for 20 runs) for small values
of cc.

Figure 6. Clustering coefficient control by the 2-step walk strategy.
In this case, the model performs much better, following a linear
relationship characterized by C = 0.74986 · cc with R^2 = 0.99965167.

Figure 7. Clustering coefficient variation with network size N, for
fixed m = 2. Results are presented for cc = 0 (yellow), cc = 0.5
(orange) and cc = 1 (blue).

Figure 8. Clustering coefficient decay with average degree.

Figure 9. Clustering coefficient control problem for high degree
nodes (same simulation parameters as in Figure 6). Nodes are divided
into 3 groups according to their degree: the clustering control
coefficient cc has a smaller influence on the most connected nodes.



The last simulation was intended to study how the
maximum clustering coefficient that can be
generated by the model, C_max, varies as m is increased. The
results presented in Figure 8 show a decay of the clustering
coefficient as m increases; this expected result is
equivalent to the behavior of other tunable clustering network
models [13], [28]. In fact, some authors have proposed the use of an
alternative clustering coefficient definition, since large values
of m bias the degree-clustering correlation in scale-free
networks when the standard definition of the clustering coefficient
is employed. The problem can be summarized as follows: very
high-degree nodes (so-called "hubs") have very little chance
of having a high clustering coefficient, because
it would require most of their neighbors to be connected
among themselves, producing a full graph around the hub,
which does not fit in a scale-free structure. To support this
intuition, a simulation has been performed, whose results are
presented in Figure 9, showing that the hubs do not reach
high values of the clustering coefficient. A new formulation of
the clustering coefficient that avoids this degree bias has been
proposed in [30]. The performance of our model under this new
definition has not been analysed yet.
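
The degree dependence described above can be examined by averaging the local clustering coefficient within degree-ordered groups of nodes. The sketch below is only an illustration: the text states that nodes are divided into 3 groups by degree but not how, so the equal-size split used here is an assumption, as are the function names and the adjacency-dictionary representation.

```python
def local_clustering(adj, v):
    """Standard local clustering coefficient of vertex v (0 if degree < 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return 2.0 * links / (k * (k - 1))

def clustering_by_degree_group(adj, groups=3):
    """Mean local clustering per group, groups ordered from low to high degree.

    Assumption: vertices are ranked by degree and split into equal-size chunks.
    """
    ranked = sorted(adj, key=lambda v: len(adj[v]))
    size = -(-len(ranked) // groups)  # ceiling division
    means = []
    for g in range(groups):
        chunk = ranked[g * size:(g + 1) * size]
        if chunk:
            means.append(sum(local_clustering(adj, v) for v in chunk) / len(chunk))
    return means

# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
means = clustering_by_degree_group(example, groups=2)
```

With two groups the example yields 0.5 for the lower-degree half (vertices 3 and 0) and 2/3 for the higher-degree half (vertices 1 and 2).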

IV. Conclusions and future work 

In this paper, a new scheme for generating scale-free networks
with a power-law degree distribution and a tunable clustering
coefficient has been presented. The scheme is based
on a combination of random walk and triangle generation
procedures together with a genetic factor implementation.
These elements allow for an accurate tuning of the clustering
coefficient, making use of local information exclusively. As
a consequence, the proposed scheme seems to explain the
generation of real networks in a more realistic manner. The
presented simulations support the validity of the scheme and
characterize its tuning capabilities.

Further research is being carried out in several directions.
On the one hand, the sensitivity of preferential attachment
policies to the random walk length l is being analysed.
On the other hand, work is also under way on
reproducing additional network metrics by using different
f(p) distributions for vertex characterization. An appropriate
selection of f(p) can potentially lead to a network where
not only the average clustering coefficient is controlled, but also
the whole clustering coefficient distribution over the network.
The distribution f(p) could also be made to depend on some
other network metrics (e.g., the degree, giving f(p, k)) in
order to reproduce some correlations between network metrics
observed in real networks. Finally, it is worth mentioning
that generalizations of this model, based on a network growth
driven exclusively by local interaction and intrinsic network
attributes, can be implemented in different ways. For instance,
some variants proposed in previous random walk models [10]
can be easily incorporated into the network model presented
here.